Non-parametric Power-law Data Clustering
Abstract
Automatically determining the number of clusters from the distribution of a dataset has always been a major challenge for clustering algorithms. Several approaches have been proposed to address this issue, including recent promising work that incorporates Bayesian nonparametrics into the k-means clustering procedure. This approach is simple to implement and theoretically sound, and it provides a feasible way to perform inference on large-scale datasets. However, several problems remain unsolved in this pioneering work, including its applicability to power-law data, the lack of a mechanism for merging centers to avoid over-fitting, and the clustering order problem. To address these issues, this paper proposes Pitman-Yor Process based k-means (pyp-means). Taking advantage of the Pitman-Yor Process, pyp-means treats clusters differently by dynamically and adaptively changing the threshold, which guarantees the generation of power-law clustering results. In addition, a center agglomeration procedure is integrated into the implementation to merge small but close clusters and thereby determine the cluster number adaptively. The paper further discusses the clustering order, a convergence proof, a complexity analysis, and an extension to spectral clustering, and compares the approach with traditional clustering algorithms and variational inference methods. The advantages and properties of pyp-means are validated by experiments on both synthetic and real-world datasets.
Keywords: Bayesian nonparametrics; Pitman-Yor Process; power-law data structure; k-means clustering.
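To illustrate why the Pitman-Yor Process is associated with power-law cluster sizes, the following minimal sketch samples a partition from the standard Pitman-Yor Chinese restaurant process. It is not the pyp-means algorithm itself, whose threshold rule and agglomeration step are not specified in this abstract. The function name `sample_pitman_yor_partition` and the parameter names `discount` and `concentration` are illustrative assumptions.

```python
import random


def sample_pitman_yor_partition(n, discount, concentration, seed=0):
    """Seat n customers by the Pitman-Yor Chinese restaurant process.

    Illustrative sketch only (hypothetical helper, not from the paper).
    Returns the cluster sizes sorted from largest to smallest.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    sizes = []  # sizes[j] = number of points currently in cluster j
    for seated in range(n):
        if not sizes:
            sizes.append(1)  # the first point always opens a cluster
            continue
        k = len(sizes)
        # Unnormalised weights: a new cluster gets concentration + discount * k,
        # an existing cluster j gets sizes[j] - discount.
        # Together they sum to seated + concentration.
        new_weight = concentration + discount * k
        u = rng.random() * (seated + concentration)
        if u < new_weight:
            sizes.append(1)
            continue
        u -= new_weight
        for j, size in enumerate(sizes):
            u -= size - discount
            if u < 0:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # discount = 0 reduces to the Dirichlet process; a larger discount
    # tends to produce many more small clusters (a heavier tail).
    for d in (0.0, 0.5):
        sizes = sample_pitman_yor_partition(5000, d, 1.0, seed=1)
        print(d, len(sizes), sizes[:5])
```

With `discount = 0` the process reduces to the Dirichlet process, where the expected number of clusters grows logarithmically in `n`. With `0 < discount < 1` it grows roughly like `n ** discount`, which gives the heavy-tailed cluster sizes that a power-law-aware clustering method aims to reproduce.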
Similar resources
Estimation of Parameters of the Power-Law-Non-Homogenous Poisson Process in the Case of Exact Failures Data
This expository article shows how the maximum likelihood estimation method and the Newton-Raphson algorithm can be used to estimate the parameters of the power-law Poisson process model used to analyze data from repairable systems.
Modeling Galaxy Clustering with Cosmological Simulations
I review recent progress in understanding and modeling galaxy clustering in cosmological simulations, with emphasis on models based on high-resolution dissipationless simulations. During the last decade, significant advances in our understanding of abundance and clustering of dark matter halos allowed construction of accurate, quantitative models of galaxy clustering both in linear and non-line...
A robust wavelet based profile monitoring and change point detection using S-estimator and clustering
Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications due to the complexity of many processes it is not usually possible to model a process using parametric models. In these cas...
On Minimally-Parametric Primordial Power Spectrum Reconstruction and the Evidence for a Red Tilt
The latest cosmological data seem to indicate a significant deviation from scale invariance of the primordial power spectrum when parameterized either by a power law or by a spectral index with non-zero “running”. This deviation, by itself, serves as a powerful tool to discriminate among theories for the origin of cosmological structures such as inflationary models. Here, we use a minimally-par...
Stable clustering, the halo model and non-linear cosmological power spectra
We present the results of a large library of cosmological N-body simulations, using power-law initial spectra. The non-linear evolution of the matter power spectra is compared with the predictions of existing analytic scaling formulae based on the work of Hamilton et al. The scaling approach has assumed that highly non-linear structures obey ‘stable clustering’ and are frozen in proper coordina...
Journal title: CoRR
Volume: abs/1306.3003
Publication date: 2013